Backoff on ResourceExhausted when fetching Temporal worker deployment state#287

Closed
carlydf wants to merge 7 commits into main from backoff-on-resource-exhausted

@carlydf carlydf commented Apr 22, 2026

What was changed and why

Problem

Namespaces with many TemporalWorkerDeployment objects can trigger Temporal's per-namespace DescribeWorkerDeployment rate limit (frontend.globalNamespaceWorkerDeploymentReadRPS, default 50 RPS). When this happened, the reconciler returned the error immediately, so the workqueue requeued the object with exponential backoff starting at ~5ms. That is effectively a tight retry loop, and it makes the rate-limit problem worse.

The error comes back as *serviceerror.ResourceExhausted (not a standard gRPC codes.ResourceExhausted status), so it must be detected with errors.As rather than grpcstatus.FromError.

Changes

  • internal/controller/worker_controller.go: detect *serviceerror.ResourceExhausted from GetWorkerDeploymentState and return RequeueAfter: 30s instead of an immediate error requeue. Sets ConditionProgressing=False with ReasonTemporalStateFetchFailed and a "Rate limited" message so the condition is visible to users.
  • internal/tests/internal/rate_limit_integration_test.go: new integration test that creates 10 TWDs against a 1 RPS limit, confirming the error surfaces with the expected condition reason and message.
  • go.mod / go.sum / go.work: bump Temporal server dependency to v1.31.0-154.2, which is the first version that enforces globalNamespaceWorkerDeploymentReadRPS.

Checklist

  1. Closes #278

  2. How was this tested:

  • KUBEBUILDER_ASSETS=.../bin/k8s/1.27.1-darwin-arm64 go test -tags test_dep ./internal/tests/internal -run "TestIntegration/rate-limit" -timeout 120s -v passes
  • Same test fails when the errors.As block is removed (reverts to immediate retry with generic message)
  • Full integration suite passes: go test -tags test_dep ./internal/tests/internal -run TestIntegration -timeout 600s
  3. Any docs updates needed?

@carlydf carlydf closed this Apr 23, 2026